AITopics | ap 25

Few-Shot 3D Point Cloud Object Detection (FS3D) is a challenging task, aiming to detect 3D objects of novel classes using only limited annotated samples for training.

artificial intelligence, machine learning, prototype, (13 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.05)
Asia > China > Guangdong Province > Shenzhen (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Uni3DETR: Unified 3D Detection Transformer

Neural Information Processing Systems, Oct-8-2025, 23:25:56 GMT

Existing point-cloud-based 3D detectors are designed for a particular scene type, either indoor or outdoor.

artificial intelligence, detection, machine learning, (17 more...)

Neural Information Processing Systems

Country:

South America > Brazil (0.04)
Asia > China > Hong Kong (0.04)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Vision (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation

Chharia, Aviral, Gou, Wenbo, Dong, Haoye

arXiv.org Artificial Intelligence, Sep-3-2025

While significant progress has been made in single-view 3D human pose estimation, multi-view 3D human pose estimation remains challenging, particularly in terms of generalizing to new camera configurations. Existing attention-based transformers often struggle to accurately model the spatial arrangement of keypoints, especially in occluded scenarios. Additionally, they tend to overfit specific camera arrangements and visual scenes from training data, resulting in substantial performance drops in new settings. In this study, we introduce a novel Multi-View State Space Modeling framework, named MV-SSM, for robustly estimating 3D human keypoints. We explicitly model the joint spatial sequence at two distinct levels: the feature level from multi-view images and the person keypoint level. We propose a Projective State Space (PSS) block to learn a generalized representation of joint spatial arrangements using state space modeling. Moreover, we modify Mamba's traditional scanning into an effective Grid Token-guided Bidirectional Scanning (GTBS), which is integral to the PSS block. Multiple experiments demonstrate that MV-SSM achieves strong generalization, outperforming state-of-the-art methods: +10.8 on AP25 (+24%) on the challenging three-camera setting in CMU Panoptic, +7.0 on AP25 (+13%) on varying camera arrangements, and +15.3 PCP (+38%) on Campus A1 in cross-dataset evaluations. Project Website: https://aviralchharia.github.io/MV-SSM

artificial intelligence, machine learning, pose estimation, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/CVPR52734.2025.01082

arXiv: 2509.00649

Country: Europe > Switzerland (0.28)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Supplementary Material for "Learning Superpoint Graph Cut for 3D Instance Segmentation" Le Hui

Neural Information Processing Systems, Aug-19-2025, 17:43:43 GMT

This supplementary material provides more details on the network architecture, visualization, and ablation study of our method. We also analyze the limitations and discuss the impact of our method. The feature dimension produced by the 3D U-Net is 32. Finally, we obtain 32-dimensional superpoint features.

artificial intelligence, machine learning, segmentation, (14 more...)

Neural Information Processing Systems

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Direct Multi-view Multi-person 3D Pose Estimation (Supplementary Material) Tao Wang

Neural Information Processing Systems, Aug-15-2025, 01:24:03 GMT

Figure S1: (a) Illustration of the proposed hierarchical query embedding and the input-dependent query adaptation schemes. It consists of a self-attention, a projective attention, and a feed-forward network (FFN) with residual connections. Fig. S1 (a) illustrates our proposed hierarchical query ... The decoder of the MvP transformer consists of multiple decoder layers for regressing 3D joint locations progressively. Fig. S1 (b) demonstrates the detailed architecture of a decoder layer, ... Results are shown in Table S1. Table S1: Results of replacing camera ray directions with 2D coordinates in RayConv (columns: Positional Input, AP). We further investigate the effectiveness of the proposed projective attention by comparing it with the dense dot product attention, i.e., conducting ... Results are given in Table S2.

pose estimation, projective attention, supplementary material, (13 more...)

Neural Information Processing Systems

Country: Asia > Singapore (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.51)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.45)

076a93fd42aa85f5ccee921a01d77dd5-Paper-Conference.pdf

Uni3DETR: Unified 3D Detection Transformer

ef0af61ccfba2bf9fad4f4df6dfcb7c3-Supplemental-Conference.pdf

6da9003b743b65f4c0ccd295cc484e57-Supplemental.pdf

Appendix A Theoretical Derivation of P-VAE

Prototypical Variational Autoencoder for Few-shot 3D Point Cloud Object Detection Weiliang Tang
